
All Questions

2 votes
1 answer
69 views

Taking into account instance cost in learning?

I am trying to take costs into account in learning. The setup is as follows: a statistical learning problem with the usual X and y, where y is imbalanced (roughly 1% ones). Scikit learn ...
Lucas Morin
0 votes
1 answer
136 views

Imbalanced Cost-Sensitive Learning Workflow - How to split the data, tune hyperparameters and apply a decision threshold?

I am facing a problem with an imbalanced dataset in which I would like to detect the rare event. My questions are more about general strategy for the whole workflow, and I would like to hear your thoughts ...
GeorgeM
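For the threshold step, one standard option (an assumption here, not something the question states) is the cost-based threshold for calibrated probabilities: if correct predictions cost nothing, predicting the rare class whenever P(y=1|x) ≥ C_FP / (C_FP + C_FN) minimizes expected cost. The costs below are hypothetical, and `bayes_threshold` and `expected_cost` are illustrative helpers, not library functions:

```python
import numpy as np


def bayes_threshold(cost_fp: float, cost_fn: float) -> float:
    """Threshold on P(y=1|x) minimizing expected cost.

    Assumes calibrated probabilities and zero cost for correct predictions.
    """
    return cost_fp / (cost_fp + cost_fn)


def expected_cost(y_true, proba, threshold, cost_fp, cost_fn):
    """Total misclassification cost of thresholding `proba` at `threshold`."""
    y_pred = (proba >= threshold).astype(int)
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false alarms
    fn = np.sum((y_pred == 0) & (y_true == 1))  # missed rare events
    return fp * cost_fp + fn * cost_fn


# Hypothetical costs: missing a rare event is 20x as costly as a false alarm
t = bayes_threshold(cost_fp=1.0, cost_fn=20.0)
print(round(t, 4))  # 0.0476
```

In a full workflow the costs (or an empirically tuned threshold) would usually be checked with `expected_cost` on a validation split, keeping the test set for the final estimate only.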
-1 votes
1 answer
61 views

How to deal with a heavily imbalanced test dataset?

Both my train data and test data were imbalanced, so I tried SMOTE for training. Before SMOTE: ...
GrGr11
4 votes
2 answers
2k views

Flipping the labels in a binary classification gives different model and results

I have an imbalanced dataset and I want to train a binary classifier to model the dataset. Here was my approach, which resulted in (relatively) acceptable performance: 1- I made a random split to get ...
Farzad
1 vote
0 answers
1k views

Downsampling in sklearn. Test and Train performance question

I have a class-imbalanced data set and the following setup to handle the imbalance. I first split the data into train and test sets, perform downsampling only on the training set, and then get the test ...
bananaboy
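A minimal sketch of the split-then-downsample order described above. It assumes synthetic data from `make_classification` (the question's data is not shown), random undersampling of the majority class to a 1:1 ratio in the training set only, and a plain logistic regression; the test set keeps its original class ratio:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with roughly 5% positives (assumption)
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# 1) Split first, stratified so the untouched test set keeps the original ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 2) Randomly downsample the majority class in the TRAINING set only
rng = np.random.default_rng(0)
pos_idx = np.flatnonzero(y_train == 1)
neg_idx = np.flatnonzero(y_train == 0)
neg_keep = rng.choice(neg_idx, size=len(pos_idx), replace=False)
keep = np.concatenate([pos_idx, neg_keep])
X_train_bal, y_train_bal = X_train[keep], y_train[keep]

# 3) Fit on the balanced training data, evaluate on the imbalanced test data
clf = LogisticRegression(max_iter=1000).fit(X_train_bal, y_train_bal)
test_bal_acc = balanced_accuracy_score(y_test, clf.predict(X_test))
```

Because training and test sets then have different class ratios, threshold-dependent metrics such as precision can look quite different between the two even when the model itself is fine.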
1 vote
2 answers
641 views

Evaluation Metric for Imbalanced and Ordinal Classification

I'm looking for an ML evaluation metric that would work well with imbalanced and ordinal multiclass datasets: Imagine you want to predict the severity of a disease that has 4 grades of severity where ...
Fabio Magarelli
2 votes
1 answer
2k views

Imbalanced data set with Sample weighting - How to interpret the performance metrics?

Consider a binary classification scenario in which the True class (5%) is severely outnumbered by the False class (95%). My data set contains numeric data. I am using SKLearn and trying some different ...
Jurgen Cuschieri
0 votes
1 answer
1k views

roc_auc_score from scikit-learn gives an error when the test label vector contains only a subset of the classes

I have an imbalanced dataset. Does it make sense to compute the ROC AUC for the classifier I created on a holdout set? Here's a very artificial MWE: ...
An old man in the sea.
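ROC AUC is undefined when the holdout labels contain only one class, which is what the error reports. A common remedy, assumed here rather than taken from the question, is a stratified split so the rare class appears in the holdout set, plus an explicit guard. The data below is synthetic (`make_classification`, about 2% positives):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced data (assumption): about 2% positives
X, y = make_classification(n_samples=1000, weights=[0.98, 0.02], flip_y=0, random_state=0)

# stratify=y keeps the positive rate in both splits, so the holdout has both classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

if len(np.unique(y_te)) < 2:
    auc = float("nan")  # ROC AUC is not defined with a single class present
else:
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

With very few positives in the holdout set the AUC estimate is noisy, so repeated stratified splits or cross-validation usually give a more trustworthy number.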
1 vote
1 answer
251 views

Imbalanced classification task – Discrepancy between learning curves and test set evaluation

I have a binary classification task related to customer churn for a bank. The dataset contains 10,000 instances and 11 features. The target variable is imbalanced (80% remained as customers (0), 20% ...
KK_o7
2 votes
1 answer
367 views

Training is not stable with extreme class imbalance

I'm dealing with a multi-class classification problem with around 30 categories. This problem has a severe class imbalance: Around 300 examples for the least common class. Around 100k examples for ...
David Masip
0 votes
1 answer
707 views

Logistic regression with unbalanced data, scoring based only on rare class

I have a dataset of approx. 600,000 data points in which 0.2% (1,200 samples) are labelled as signifying a rare event. I want to use logistic regression to help me predict this rare event, but even when ...
Nick W
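One way to make model selection score only the rare class, sketched here under assumptions not stated in the question, is to tune with a positive-class metric such as `scoring="average_precision"` (area under the precision-recall curve of class 1). The synthetic data, `class_weight="balanced"`, and the grid of `C` values are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in data (assumption): roughly 1% positives
X, y = make_classification(n_samples=5000, weights=[0.99], random_state=0)

search = GridSearchCV(
    LogisticRegression(class_weight="balanced", max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0]},
    # Average precision summarizes precision/recall of the positive (rare) class only
    scoring="average_precision",
    # Stratified folds so every fold contains some rare-class examples
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
).fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Unlike accuracy, average precision is not inflated by the many easy negatives, which is the usual reason to prefer it (or F1 on the positive class) in this setting.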
0 votes
1 answer
28 views

Unbalanced training set from balanced data

I am looking to get an unbalanced training set with a given ratio of classA:classB from a dataset, regardless of whether it is balanced or not. The point is to analyze the influence of data imbalance on ...
jelczyn
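A minimal sketch of drawing a training subset with a chosen classA:classB ratio from any labelled dataset, assuming NumPy arrays and sampling without replacement. `subsample_to_ratio` is a hypothetical helper; it keeps as many samples as the available counts allow for the requested ratio (up to integer rounding):

```python
import numpy as np


def subsample_to_ratio(X, y, ratio_a_to_b, class_a=0, class_b=1, seed=0):
    """Hypothetical helper: largest subset with n_a / n_b ~= ratio_a_to_b."""
    rng = np.random.default_rng(seed)
    idx_a = np.flatnonzero(y == class_a)
    idx_b = np.flatnonzero(y == class_b)
    # Largest n_b such that class A still has ratio * n_b samples available
    n_b = min(len(idx_b), int(len(idx_a) / ratio_a_to_b))
    n_a = int(round(n_b * ratio_a_to_b))
    keep = np.concatenate([
        rng.choice(idx_a, size=n_a, replace=False),
        rng.choice(idx_b, size=n_b, replace=False),
    ])
    rng.shuffle(keep)  # avoid class-sorted order in the returned subset
    return X[keep], y[keep]


# Toy balanced data: 500 samples per class, then a 9:1 subset
X = np.arange(2000).reshape(1000, 2)
y = np.repeat([0, 1], 500)
X_sub, y_sub = subsample_to_ratio(X, y, ratio_a_to_b=9.0)
print(np.bincount(y_sub))  # expected counts: 495 for class 0, 55 for class 1
```

Varying `ratio_a_to_b` (and `seed`) while keeping the test set fixed is one way to isolate the effect of imbalance on a model.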
0 votes
1 answer
2k views

How does class_weight work in Decision Tree?

I am interested in cost-sensitive learning, and I am trying to understand how class_weight in DecisionTree works mathematically. I read a lot of articles that ...
Marni
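As far as we understand scikit-learn's tree implementation, `class_weight` is expanded into a per-sample weight (each sample gets the weight of its class), and the split criterion (e.g. Gini) is then computed from weighted class counts instead of raw counts. The sketch below checks that equivalence empirically on synthetic data; the weights `{0: 1.0, 1: 9.0}` are an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data (assumption): about 10% positives
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
weights = {0: 1.0, 1: 9.0}  # illustrative class weights

# Variant 1: class weights passed to the estimator
tree_cw = DecisionTreeClassifier(class_weight=weights, random_state=0).fit(X, y)

# Variant 2: the same weighting expressed as one weight per sample
sw = np.where(y == 1, weights[1], weights[0])
tree_sw = DecisionTreeClassifier(random_state=0).fit(X, y, sample_weight=sw)

# Identical split thresholds indicate both trees saw the same weighted impurities
same_splits = np.array_equal(tree_cw.tree_.threshold, tree_sw.tree_.threshold)
print(same_splits)
```

In weighted-Gini terms, a node's impurity becomes 1 - Σ_k (W_k / W)², where W_k is the summed weight of class-k samples in the node and W the node's total weight, so up-weighting the rare class makes splits that isolate it look more valuable.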
0 votes
2 answers
974 views

GridSearch on imbalanced datasets

I'm trying to use GridSearch to find the best parameters for my model. Knowing that I have to apply the NearMiss undersampling method during cross-validation, should I fit my GridSearch on my ...
Valentin
